Relational RSS Clustering Techniques

Author

  • Richard Roesler
Abstract

1. INTRODUCTION

The amount of news and current-event information available on the Internet has exploded. 24-hour news outlets and independent bloggers alike flood the wires with a constant stream of data. Although an increasing number of people rely on the Internet as their primary source of news and current events, it is becoming increasingly difficult for users to find what they are looking for. Really Simple Syndication (RSS) provided a means to bring the data directly to the user through easily parsable XML feeds. Of course, this only changed how users looked for news, not how much news there was. RSS aggregators, like Google News, were created to learn a user's preferences and display only the RSS stories that the user would like. Google News, in particular, has found a great deal of success applying algorithms from the text clustering community to RSS data: users choose a story, and the search engine provides a list of related stories that the user (hopefully) finds interesting.

The potential applications of RSS clustering, however, go well beyond simple aggregation. In the future, Semantic Web technologies may be able to extract knowledge from a stream of RSS feeds, i.e., find similarities, detect patterns, and infer things that even the best human analysts cannot detect. This type of application is well beyond the scope of this project. Instead, we focus on one aspect of such an application: sorting large amounts of RSS data. We present here the results of applying standard clustering techniques to the RSS problem, along with an analysis of how well these techniques work. In Section 2 we describe some of the unique features that separate RSS clustering from other text clustering applications. The methods and results of our application are presented in Section 3, along with an extensive analysis of these results. We finish with some suggestions to improve future applications.


Similar resources

Utilizing phrase-similarity measures for detecting and clustering informative RSS news articles

As the number of RSS news feeds on the Internet continues to increase, it becomes necessary to minimize the workload of the user, who is otherwise required to scan through a huge number of news articles to find related articles of interest, a tedious and often impossible task. To solve this problem, we present a novel approach, called InFRSS, which consists of a correlation-b...


Suffix Tree Based Chinese Document Feature Extraction and Clustering in RSS Aggregator

In an RSS aggregator, an important issue is how to make feed information more manageable for RSS subscribers. In this paper, we propose suffix-tree-based clustering of RSS feed documents in a Chinese RSS aggregator. We construct a suffix tree with meaningful Chinese words and choose the phrases that receive a high score from a formula as document features. We cluster documents using group-average algo...


A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection

Considering the concept of clustering, the main idea of the present study is that the stocks to be chosen and ranked will not necessarily all fall into one cluster. Taking this point into account, the study aims to offer a new methodology for making decisions about forming a portfolio of stocks in the stock market. To meet this end, Multiple-Criteria Decisi...


Indoor Positioning and Pre-processing of RSS Measurements

The rapid expansion of new location-based services signifies the need for accurate localization techniques for indoor environments. Among different techniques, RSS-based schemes, and in particular one of their variants based on Graph-based Semi-Supervised Learning (G-SSL), are widely used approaches. The advantage of this scheme is that it has low setup/training cost and at the same ti...


Synthesizing correlated RSS news articles based on a fuzzy equivalence relation

Tens of thousands of news articles are posted on-line each day, covering topics from politics to science to current events. To better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds to locate articles pertaining to their particula...



Publication date: 2009